Skip to content

Conversation

@harrisonGPU
Copy link
Contributor

@harrisonGPU harrisonGPU commented May 31, 2025

Allow freeze to sink through fmul by treating it as a non-poison-generating op
when operands are not poison.

Adding ISD::FMUL to AllowMultipleMaybePoisonOperands lets DAG combine
push freeze through fmul. This helps expose patterns like fmul+fadd for FMA fusion.

When rebuilding the node, we drop flags like nnan/ninf/nsz that imply poison,
but keep contract, reassoc, afn, and arcp.

Closes: #141622

@llvmbot
Copy link
Member

llvmbot commented May 31, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-selectiondag

Author: Harrison Hao (harrisonGPU)

Changes

Avoid blocking FMA formation on freeze(fmul x, y). Enable combining patterns like:

  • freeze(x * y) + z → fma(x, y, z)

  • z + freeze(x * y) → fma(x, y, z)

  • freeze(x * y) - z → fma(x, y, -z)

  • z - freeze(x * y) → fma(-x, y, z)

This improves precision and performance in common numerical code.

Closes: #141622


Full diff: https://github.com/llvm/llvm-project/pull/142250.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+46)
  • (added) llvm/test/CodeGen/AMDGPU/fold-freeze-fmul-to-fma.ll (+54)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index be2209a2f8faf..ea5b44da9e48b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -16729,6 +16729,28 @@ SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
     }
   }
 
+  // fold (fadd (freeze (fmul x, y)), z) -> (fma x, y, z).
+  if ((Options.UnsafeFPMath || N->getFlags().hasAllowContract()) &&
+      N0.getOpcode() == ISD::FREEZE) {
+    SDValue FrozenMul = N0.getOperand(0);
+    if (matcher.match(FrozenMul, ISD::FMUL) && isContractableFMUL(FrozenMul)) {
+      SDValue X = FrozenMul.getOperand(0);
+      SDValue Y = FrozenMul.getOperand(1);
+      return matcher.getNode(PreferredFusedOpcode, SL, VT, X, Y, N1);
+    }
+  }
+
+  // fold (fadd x, (freeze (fmul y, z))) -> (fma y, z, x)
+  if ((Options.UnsafeFPMath || N->getFlags().hasAllowContract()) &&
+      N1.getOpcode() == ISD::FREEZE) {
+    SDValue FrozenMul = N1.getOperand(0);
+    if (matcher.match(FrozenMul, ISD::FMUL) && isContractableFMUL(FrozenMul)) {
+      SDValue X = FrozenMul.getOperand(0);
+      SDValue Y = FrozenMul.getOperand(1);
+      return matcher.getNode(PreferredFusedOpcode, SL, VT, X, Y, N0);
+    }
+  }
+
   // More folding opportunities when target permits.
   if (Aggressive) {
     // fold (fadd (fma x, y, (fpext (fmul u, v))), z)
@@ -17006,6 +17028,30 @@ SDValue DAGCombiner::visitFSUBForFMACombine(SDNode *N) {
     }
   }
 
+  // fold (fsub (freeze (fmul x, y)), z) -> (fma x, y, (fneg z))
+  if ((Options.UnsafeFPMath || N->getFlags().hasAllowContract()) &&
+      N0.getOpcode() == ISD::FREEZE) {
+    SDValue FrozenMul = N0.getOperand(0);
+    if (matcher.match(FrozenMul, ISD::FMUL) && isContractableFMUL(FrozenMul)) {
+      SDValue X = FrozenMul.getOperand(0);
+      SDValue Y = FrozenMul.getOperand(1);
+      SDValue NegZ = matcher.getNode(ISD::FNEG, SL, VT, N1);
+      return matcher.getNode(PreferredFusedOpcode, SL, VT, X, Y, NegZ);
+    }
+  }
+
+  // fold (fsub z, (freeze(fmul x, y))) -> (fma (fneg x), y, z)
+  if ((Options.UnsafeFPMath || N->getFlags().hasAllowContract()) &&
+      N1.getOpcode() == ISD::FREEZE) {
+    SDValue FrozenMul = N1.getOperand(0);
+    if (matcher.match(FrozenMul, ISD::FMUL) && isContractableFMUL(FrozenMul)) {
+      SDValue X = FrozenMul.getOperand(0);
+      SDValue Y = FrozenMul.getOperand(1);
+      SDValue NegX = matcher.getNode(ISD::FNEG, SL, VT, X);
+      return matcher.getNode(PreferredFusedOpcode, SL, VT, NegX, Y, N0);
+    }
+  }
+
   auto isReassociable = [&Options](SDNode *N) {
     return Options.UnsafeFPMath || N->getFlags().hasAllowReassociation();
   };
diff --git a/llvm/test/CodeGen/AMDGPU/fold-freeze-fmul-to-fma.ll b/llvm/test/CodeGen/AMDGPU/fold-freeze-fmul-to-fma.ll
new file mode 100644
index 0000000000000..75fe67e743c03
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fold-freeze-fmul-to-fma.ll
@@ -0,0 +1,54 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s -check-prefix GFX11
+
+define float @fma_from_freeze_mul_add_left(float %x, float %y) {
+; GFX11-LABEL: fma_from_freeze_mul_add_left:
+; GFX11:       ; %bb.0: ; %bb
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_fma_f32 v0, v0, v1, 1.0
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+bb:
+  %mul = fmul contract float %x, %y
+  %mul.fr = freeze float %mul
+  %add = fadd contract float %mul.fr, 1.000000e+00
+  ret float %add
+}
+
+define float @fma_from_freeze_mul_add_right(float %x, float %y) {
+; GFX11-LABEL: fma_from_freeze_mul_add_right:
+; GFX11:       ; %bb.0: ; %bb
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_fma_f32 v0, v0, v1, 1.0
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+bb:
+  %mul = fmul contract float %x, %y
+  %mul.fr = freeze float %mul
+  %add = fadd contract float 1.000000e+00, %mul.fr
+  ret float %add
+}
+
+define float @fma_from_freeze_mul_sub_left(float %x, float %y) {
+; GFX11-LABEL: fma_from_freeze_mul_sub_left:
+; GFX11:       ; %bb.0: ; %bb
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_fma_f32 v0, v0, v1, -1.0
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+bb:
+  %mul = fmul contract float %x, %y
+  %mul.fr = freeze float %mul
+  %sub = fsub contract float %mul.fr, 1.000000e+00
+  ret float %sub
+}
+
+define float @fma_from_freeze_mul_sub_right(float %x, float %y) {
+; GFX11-LABEL: fma_from_freeze_mul_sub_right:
+; GFX11:       ; %bb.0: ; %bb
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_fma_f32 v0, -v0, v1, 1.0
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+bb:
+  %mul = fmul contract float %x, %y
+  %mul.fr = freeze float %mul
+  %sub = fsub contract float 1.000000e+00, %mul.fr
+  ret float %sub
+}

@RKSimon
Copy link
Collaborator

RKSimon commented May 31, 2025

By the looks of it SelectionDAG::canCreateUndefOrPoison doesn't handle fp ops at all - should we start with fixing those to more closely match ValueTracking llvm::canCreateUndefOrPoison?

@harrisonGPU
Copy link
Contributor Author

By the looks of it SelectionDAG::canCreateUndefOrPoison doesn't handle fp ops at all - should we start with fixing those to more closely match ValueTracking llvm::canCreateUndefOrPoison?

Thanks for the pointer. After rereading the LLVM Language Reference Manual I agree that
SelectionDAG::canCreateUndefOrPoison should return false for plain FP ops (fadd, fmul, …):
by default they propagate poison but never create it.
The only time they can generate new poison is when the instruction carries the nnan or ninf
(or the umbrella fast) fast-math flags, because those flags require a NaN/Inf result to become poison.

The LangRef notes for every FP op include:

“This instruction can also take any number of
fast-math flags, which are optimization hints to enable otherwise
unsafe floating-point optimizations.”

What do you think of this?

References

@RKSimon
Copy link
Collaborator

RKSimon commented Jun 1, 2025

SelectionDAG::canCreateUndefOrPoison checks for hasPoisonGeneratingFlags - which seems to cover nonans/noinfs - but we'd definitely need to have test coverage for this.

@harrisonGPU
Copy link
Contributor Author

SelectionDAG::canCreateUndefOrPoison checks for hasPoisonGeneratingFlags - which seems to cover nonans/noinfs - but we'd definitely need to have test coverage for this.

Thanks a lot! I’ll add a test case with nnan / ninf to verify that canCreateUndefOrPoison correctly returns true when required.
I'm also preparing a new PR with lit tests ensuring that SelectionDAG::canCreateUndefOrPoison returns false for plain FP ops without poison-generating flags.

@harrisonGPU harrisonGPU changed the title [DAGCombiner] Fold freeze(fmul) + fadd/fsub into FMA combine [DAGCombiner] Fold freeze(fmul) + fadd/fsub into FMA when 'contract' is present Jun 4, 2025
@harrisonGPU harrisonGPU changed the title [DAGCombiner] Fold freeze(fmul) + fadd/fsub into FMA when 'contract' is present [DAGCombiner] Allow freeze(fmul) + fadd/fsub → FMA when using contract Jun 4, 2025
@harrisonGPU harrisonGPU requested a review from nikic June 4, 2025 04:59
@harrisonGPU
Copy link
Contributor Author

harrisonGPU commented Jun 4, 2025

Hi, I want to implement support for folding freeze(fmul) + fadd/fsub into FMA when using the contract flag. This is important for graphics GPU compilers because we set nnan by default. Could you please give me some suggestions? I have already updated the PR and lit tests. @jayfoad @nikic @RKSimon @arsenm :-)

@RKSimon
Copy link
Collaborator

RKSimon commented Jun 4, 2025

DAGCombiner::visitFREEZE has a list of "AllowMultipleMaybePoisonOperands" opcodes where it will try to force the freeze through the op and onto all its operandss - I've never been sure of the requirements for this list (or why we don't perform it for all non-poison-generating ops) - but if its permissible to add FMUL to it then that might be a much simpler solution.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct as implemented, because you're losing the freeze. I think @RKSimon's suggestion is the correct way to approach this.

@harrisonGPU
Copy link
Contributor Author

harrisonGPU commented Jun 4, 2025

Thank you for your suggestions ! @RKSimon @nikic , I really appreciate it, I will try to implement!

@harrisonGPU harrisonGPU changed the title [DAGCombiner] Allow freeze(fmul) + fadd/fsub → FMA when using contract [DAGCombiner] Allow freeze to sink through fmul by adding it to AllowMultipleMaybePoisonOperands Jun 7, 2025
@harrisonGPU
Copy link
Contributor Author

harrisonGPU commented Jun 7, 2025

Hi @RKSimon @nikic @jayfoad @arsenm , I've added FMUL to AllowMultipleMaybePoisonOperands.
When you have time, could you please take a look and let me know if you have any suggestions?

@harrisonGPU harrisonGPU requested a review from nikic June 7, 2025 10:44
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

The flag preservation matches the middle-end behavior (https://llvm.godbolt.org/z/a97Yr1Eh5). This assumes that rewrite-based FMF is transparent to freeze.

@harrisonGPU
Copy link
Contributor Author

Thanks !

@harrisonGPU harrisonGPU merged commit 102dfa8 into llvm:main Jun 8, 2025
7 checks passed
@harrisonGPU harrisonGPU deleted the dag-fma branch June 8, 2025 11:39
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 8, 2025

LLVM Buildbot has detected a new failure on builder lldb-arm-ubuntu running on linaro-lldb-arm-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/17180

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: tools/lldb-dap/repl-mode/TestDAP_repl_mode_detection.py (1202 of 3238)
UNSUPPORTED: lldb-api :: tools/lldb-dap/restart/TestDAP_restart_runInTerminal.py (1203 of 3238)
UNSUPPORTED: lldb-api :: tools/lldb-dap/runInTerminal/TestDAP_runInTerminal.py (1204 of 3238)
UNSUPPORTED: lldb-api :: tools/lldb-dap/save-core/TestDAP_save_core.py (1205 of 3238)
PASS: lldb-api :: tools/lldb-dap/progress/TestDAP_Progress.py (1206 of 3238)
PASS: lldb-api :: tools/lldb-dap/send-event/TestDAP_sendEvent.py (1207 of 3238)
PASS: lldb-api :: tools/lldb-dap/source/TestDAP_source.py (1208 of 3238)
UNSUPPORTED: lldb-api :: tools/lldb-dap/stackTrace-x86/TestDAP_source_x86.py (1209 of 3238)
PASS: lldb-api :: tools/lldb-dap/restart/TestDAP_restart.py (1210 of 3238)
UNSUPPORTED: lldb-api :: tools/lldb-dap/stackTrace/subtleFrames/TestDAP_subtleFrames.py (1211 of 3238)
FAIL: lldb-api :: tools/lldb-dap/server/TestDAP_server.py (1212 of 3238)
******************** TEST 'lldb-api :: tools/lldb-dap/server/TestDAP_server.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --arch armv8l --build-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/tools/lldb-dap/server -p TestDAP_server.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision 102dfa8a487a1c44a95a225825039d282be712a0)
  clang revision 102dfa8a487a1c44a95a225825039d282be712a0
  llvm revision 102dfa8a487a1c44a95a225825039d282be712a0
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
========= DEBUG ADAPTER PROTOCOL LOGS =========
no log file available
========= END =========
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_server_interrupt (TestDAP_server.TestDAP_server)
========= DEBUG ADAPTER PROTOCOL LOGS =========
no log file available
========= END =========
========= DEBUG ADAPTER PROTOCOL LOGS =========
no log file available
========= END =========
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_server_port (TestDAP_server.TestDAP_server)
DAP session (client_1) error: Bad file descriptor
========= DEBUG ADAPTER PROTOCOL LOGS =========
no log file available
========= END =========
========= DEBUG ADAPTER PROTOCOL LOGS =========
no log file available
========= END =========
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_server_unix_socket (TestDAP_server.TestDAP_server)
======================================================================
FAIL: test_server_interrupt (TestDAP_server.TestDAP_server)

tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…MultipleMaybePoisonOperands (llvm#142250)

Allow freeze to sink through fmul by treating it as a
non-poison-generating op
when operands are not poison.

Adding `ISD::FMUL` to `AllowMultipleMaybePoisonOperands` lets DAG
combine
push freeze through fmul. This helps expose patterns like `fmul+fadd`
for `FMA` fusion.

When rebuilding the node, we drop flags like nnan/ninf/nsz that imply
poison,
but keep contract, reassoc, afn, and arcp.


Closes: llvm#141622
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AMDGPU llvm:SelectionDAG SelectionDAGISel as well

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AMDGPU] InstCombine moving freeze instructions breaks FMA formation

5 participants